[Figure 1 panels: "Language model on general corpus" (BookCorpus + Wikipedia; Encoders 1 to 12; masked LM and next-sentence prediction heads), "Language model on financial corpus" (Reuters TRC2-financial; same heads), "Classification model on financial sentiment dataset" (Financial PhraseBank; sentiment prediction head)]
Figure 1: Overview of pre-training, further pre-training and classification fine-tuning
4.2.3 FiQA Sentiment. FiQA [15] is a dataset that was created for the WWW ’18 conference financial opinion mining and question answering challenge.⁶ We use the data for Task 1, which includes 1,174 financial news headlines and tweets with their corresponding sentiment scores. Unlike Financial PhraseBank, the targets for this dataset are continuous, ranging between [−1, 1], with 1 being the most positive. Each example also has information regarding which financial entity is targeted in the sentence. We use 10-fold cross-validation to evaluate the model on this dataset.
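The 10-fold protocol can be sketched as follows. This is a minimal plain-Python illustration, not the evaluation code used for FinBERT; it assumes a single random shuffle with a fixed seed and no stratification (the targets are continuous), and the function name `k_fold_indices` is hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_examples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; every example is tested exactly once.

    Hypothetical helper: shuffles once with a fixed seed (an assumption) and
    gives the first n_examples % k folds one extra example.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)

    # Fold sizes differ by at most one example.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i, test in enumerate(folds):
        # Training set = all other folds concatenated.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# FiQA Task 1 has 1,174 examples, so each test fold holds 118 or 117 of them.
splits = k_fold_indices(1174, k=10)
print([len(test) for _, test in splits])  # [118, 118, 118, 118, 117, 117, 117, 117, 117, 117]
```

A per-fold metric such as MSE would then typically be averaged over the ten test folds.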
4.3 Baseline Methods
For contrastive experiments, we consider baselines with three different methods: an LSTM classifier with GloVe embeddings, an LSTM classifier with ELMo embeddings and a ULMFit classifier. It should be noted that we did not experiment with these baseline methods as thoroughly as we did with BERT. Therefore, the results should not be interpreted as definitive evidence that one method is better than another.
4.3.1 LSTM classifiers. We implement two classifiers using bidirectional LSTM models. In both of them, a hidden size of 128 is used, so the last hidden state has size 256 due to bidirectionality. A fully connected feed-forward layer maps the last hidden state to a vector of size three, representing the likelihood of each of the three labels. The difference between the two models is that one uses GloVe embeddings, while the other uses ELMo embeddings. A dropout probability of 0.3 and a learning rate of 3e-5 are used in both models. We train them until there is no improvement in validation loss for 10 epochs.
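The sizes above can be traced in a small sketch: the final forward and backward states (128 values each) are concatenated into a 256-dimensional vector, and one dense layer maps it to three scores. The softmax over those scores, the label order and all function names are assumptions for illustration; in practice the states would come from an LSTM library rather than be passed in by hand. The second function illustrates the stopping rule with a patience of 10 epochs.

```python
import math
from typing import List, Sequence

HIDDEN_SIZE = 128  # per direction, as stated in the text
NUM_LABELS = 3     # assumed order: positive, negative, neutral


def classifier_head(forward_state: Sequence[float], backward_state: Sequence[float],
                    weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Dense layer (3 x 256) on the concatenated BiLSTM states, followed by a softmax (assumed)."""
    last_hidden = list(forward_state) + list(backward_state)
    if len(last_hidden) != 2 * HIDDEN_SIZE:
        raise ValueError("expected 2 x 128 = 256 hidden values")
    logits = [sum(w * h for w, h in zip(row, last_hidden)) + b for row, b in zip(weights, bias)]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def epochs_until_early_stop(val_losses: Sequence[float], patience: int = 10) -> int:
    """Epochs run before validation loss has failed to improve for `patience` consecutive epochs."""
    best = math.inf
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


probs = classifier_head([0.0] * HIDDEN_SIZE, [0.0] * HIDDEN_SIZE,
                        [[0.0] * (2 * HIDDEN_SIZE)] * NUM_LABELS, [0.0] * NUM_LABELS)
print(probs)  # [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
print(epochs_until_early_stop([1.0, 0.9] + [0.95] * 10))  # 12
```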
4.3.2 ULMFit. As explained in Section 3.1.3, classification with ULMFit consists of three steps. The first step, pre-training a language model, is already done: the pre-trained weights have been released by Howard and Ruder (2018). We first further pre-train the AWD-LSTM language model on the TRC2-financial corpus for 3 epochs. After that, we fine-tune the model for classification on the Financial PhraseBank dataset by adding a fully connected layer to the output of the pre-trained language model.

⁶ Data can be found here: https://sites.google.com/view/fiqa/home
4.4 Evaluation Metrics
For evaluation of classification models, we use three metrics: accuracy, cross-entropy loss and macro F1 average. We weight the cross-entropy loss with the square root of the inverse frequency rate. For example, if a label constitutes 25% of all examples, we weight the loss attributed to that label by 2, since √(1/0.25) = 2.
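The weighting rule can be checked numerically with the sketch below. It is a minimal illustration rather than the FinBERT training code: labels are assumed to be integer class indices, and taking a plain mean over examples is an assumption (PyTorch's `nn.CrossEntropyLoss` with a `weight` argument and mean reduction instead divides by the sum of the applied weights).

```python
import math
from collections import Counter
from typing import Dict, Sequence


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by sqrt(1 / frequency rate) = sqrt(N / count)."""
    counts = Counter(labels)
    n = len(labels)
    return {label: math.sqrt(n / count) for label, count in counts.items()}


def weighted_cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Mean over examples of -w[y] * log p(y), with p(y) the predicted probability of the true label."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, targets)]
    return sum(losses) / len(losses)


# Label 0 makes up 25% of the examples, so its weight is sqrt(1 / 0.25) = 2.
w = class_weights([0, 1, 1, 1])
print(w[0])  # 2.0
```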
Macro F1 average calculates the F1 score for each class and then takes their unweighted average. Since our data, Financial PhraseBank, suffers from label imbalance (almost 60% of all sentences are neutral), this gives another good measure of classification performance, because each class counts equally regardless of its size.
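To see why macro F1 complements accuracy under this imbalance, consider a degenerate classifier that always predicts "neutral" on a toy set with the same 60% neutral share. The plain-Python sketch below (function names and toy labels are illustrative) computes per-class F1 and averages it without weighting by class size.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1; a class with zero precision and recall scores 0."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


y_true = ["neutral"] * 6 + ["positive"] * 2 + ["negative"] * 2
y_pred = ["neutral"] * 10
print(accuracy(y_true, y_pred))                                                 # 0.6
print(round(macro_f1(y_true, y_pred, ["positive", "negative", "neutral"]), 4))  # 0.25
```

Accuracy rewards the majority-class guess, while macro F1 exposes that two of the three classes are never recognised.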
For evaluation of the regression model, we report mean squared error and R², as these are both standard and also reported by the state-of-the-art papers for the FiQA dataset.
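Both regression metrics follow their standard definitions, MSE = (1/n) Σ (yᵢ − ŷᵢ)² and R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)². A plain-Python sketch; the scores below are invented for illustration in the FiQA target range [−1, 1], not FiQA data.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot; undefined (division by zero) if all targets are equal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [-1.0, 0.0, 1.0]  # toy sentiment scores
y_pred = [-0.5, 0.0, 0.5]  # a model that under-shoots the extremes
print(round(mean_squared_error(y_true, y_pred), 4))  # 0.1667
print(r_squared(y_true, y_pred))                     # 0.75
```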
4.5 Implementation Details
For our implementation of BERT, we use a dropout probability of p = 0.1, a warm-up proportion of 0.2, a maximum sequence length of 64 tokens, a learning rate of 2e−5 and a mini-batch size of 64.
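These settings can be collected in one place, and the warm-up proportion turned into a per-step learning rate. The sketch assumes linear warm-up to the peak rate over the first 20% of steps followed by linear decay to zero, the schedule used in the original BERT release; the text does not state the decay shape, so treat it as an assumption. `BertFineTuneConfig` and `learning_rate_at` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertFineTuneConfig:
    """Hyperparameters listed in Section 4.5 (class name is hypothetical)."""
    dropout: float = 0.1
    warmup_proportion: float = 0.2
    max_seq_length: int = 64
    learning_rate: float = 2e-5
    batch_size: int = 64
    num_epochs: int = 6


def learning_rate_at(step: int, total_steps: int, cfg: BertFineTuneConfig = BertFineTuneConfig()) -> float:
    """Assumed schedule: linear warm-up for warmup_proportion of the steps, then linear decay to 0."""
    warmup_steps = int(total_steps * cfg.warmup_proportion)
    if step < warmup_steps:
        return cfg.learning_rate * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * (remaining / (total_steps - warmup_steps))


# With 100 optimisation steps, the rate peaks at step 20 and reaches 0 at step 100.
print([learning_rate_at(s, 100) for s in (0, 20, 100)])  # [0.0, 2e-05, 0.0]
```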
We train the model for 6 epochs, evaluate on the validation set and choose the best model. For discriminative fine-tuning, we set the discrimination rate to 0.85. We start training with only the classification layer unfrozen; after each third of a training epoch, we unfreeze the next layer.
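One way to read these two settings together is sketched below, under stated assumptions: the classification layer trains at the base learning rate of 2e−5, each layer further down gets 0.85 times the rate of the layer above it, and layers are unfrozen top-down, one per third of an epoch, with the embeddings last. The layer names and functions are hypothetical. In PyTorch, the resulting rates would typically be passed to the optimizer as parameter groups, and frozen layers would have `requires_grad` set to `False`.

```python
from typing import Dict, List


def layers_top_down(num_encoder_layers: int = 12) -> List[str]:
    """Hypothetical layer names, from the classifier down to the embeddings."""
    encoders = [f"encoder_{i}" for i in range(num_encoder_layers, 0, -1)]
    return ["classifier"] + encoders + ["embeddings"]


def discriminative_learning_rates(base_lr: float = 2e-5, discrimination_rate: float = 0.85,
                                  num_encoder_layers: int = 12) -> Dict[str, float]:
    """Each layer's rate is discrimination_rate times the rate of the layer directly above it."""
    return {name: base_lr * discrimination_rate ** depth
            for depth, name in enumerate(layers_top_down(num_encoder_layers))}


def trainable_layers(step: int, steps_per_epoch: int, num_encoder_layers: int = 12) -> List[str]:
    """Gradual unfreezing: classifier only at first, one more layer per third of an epoch."""
    layers = layers_top_down(num_encoder_layers)
    thirds_elapsed = (3 * step) // steps_per_epoch  # integer arithmetic avoids float rounding
    return layers[:min(1 + thirds_elapsed, len(layers))]


rates = discriminative_learning_rates()
print(f"{rates['encoder_12']:.2e}")                      # 1.70e-05
print(trainable_layers(step=30, steps_per_epoch=90))     # ['classifier', 'encoder_12']
```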
An Amazon p2.xlarge EC2 instance with one NVIDIA K80 GPU, 4 vCPUs and 64 GiB of host memory is used to train the models.
5 EXPERIMENTAL RESULTS (RQ1 & RQ2)
The results of FinBERT, the baseline methods and the state of the art on the Financial PhraseBank classification task can be seen in Table 2. We present the results on both the whole dataset and the subset with 100% annotator agreement.